Smoothed Analysis
Smoothed Analysis of Sequential Probability Assignment
We initiate the study of smoothed analysis for the sequential probability assignment problem with contexts. We study information-theoretically optimal minimax rates as well as a framework for algorithmic reduction involving the maximum likelihood estimator (MLE) oracle. Our approach establishes a general-purpose reduction from minimax rates for sequential probability assignment for smoothed adversaries to minimax rates for transductive learning. This leads to optimal (logarithmic) fast rates for parametric classes and classes with finite VC dimension. On the algorithmic front, we develop an algorithm that efficiently taps into the MLE oracle for general classes of functions. We show that under general conditions this algorithmic approach yields sublinear regret.
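A minimal sketch of the follow-the-MLE idea the abstract alludes to, under simplifying assumptions: the class is a hypothetical finite grid of constant predictors (contexts are omitted), labels are Bernoulli, and the MLE oracle is a brute-force likelihood maximization. This is an illustration of the oracle interface, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite class: constant predictors f(x) = theta on a grid.
thetas = np.linspace(0.05, 0.95, 19)

def mle_oracle(ys):
    """MLE oracle: maximize the log-likelihood of past labels over the class."""
    if not ys:
        return 0.5
    lls = [sum(np.log(t if y else 1.0 - t) for y in ys) for t in thetas]
    return float(thetas[int(np.argmax(lls))])

# Follow-the-MLE: at each round, predict with the oracle's fit on the
# history and pay the log (self-information) loss.
T, total_loss, ys = 200, 0.0, []
for t in range(T):
    p = mle_oracle(ys)
    y = int(rng.random() < 0.7)            # labels drawn Bernoulli(0.7)
    total_loss += -np.log(p if y else 1.0 - p)
    ys.append(y)
```

With a well-specified class the average log loss approaches the entropy of the label distribution, and the gap to the best predictor in hindsight is the regret the paper's rates control.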
Smoothed Analysis of Online and Differentially Private Learning
Practical and pervasive needs for robustness and privacy in algorithms have inspired the design of online adversarial and differentially private learning algorithms. The primary quantity that characterizes learnability in these settings is the Littlestone dimension of the class of hypotheses [Ben-David et al., 2009, Alon et al., 2019]. This characterization is often interpreted as an impossibility result because classes such as linear thresholds and neural networks have infinite Littlestone dimension. In this paper, we apply the framework of smoothed analysis [Spielman and Teng, 2004], in which adversarially chosen inputs are perturbed slightly by nature. We show that fundamentally stronger regret and error guarantees are possible with smoothed adversaries than with worst-case adversaries. In particular, we obtain regret and privacy error bounds that depend only on the VC dimension and the bracketing number of a hypothesis class, and on the magnitudes of the perturbations.
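A toy demonstration of why smoothing changes the picture, under assumptions of my own (a one-dimensional threshold class, uniform perturbations, and a halving-style learner; none of this is the paper's construction): thresholds have infinite Littlestone dimension, so a worst-case adversary forces unboundedly many mistakes, yet each mistake below at least halves the version space, so a smoothed adversary forces only a few.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, T = 0.1, 1000
theta = 0.37                     # hidden threshold; label = 1[x >= theta]

# Version space of thresholds consistent with the history: [lo, hi].
lo, hi = 0.0, 1.0
mistakes = 0
for t in range(T):
    mid = (lo + hi) / 2
    # Adversary aims at the learner's uncertainty region, but nature
    # perturbs its choice uniformly in [-sigma, sigma].
    x = float(np.clip(mid + sigma * rng.uniform(-1, 1), 0.0, 1.0))
    pred = int(x >= mid)         # predict with the version-space midpoint
    label = int(x >= theta)
    mistakes += int(pred != label)
    if label:                    # theta <= x: shrink from above
        hi = min(hi, x)
    else:                        # theta > x: shrink from below
        lo = max(lo, x)
```

Every mistake lands `x` between `theta` and the midpoint, so the interval at least halves; smoothing makes such landings rare once the interval is small.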
Smoothed analysis of the low-rank approach for smooth semidefinite programs
We consider semidefinite programs (SDPs) of size $n$ with equality constraints. In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over a matrix $Y$ of size $n\times k$ such that $X=YY^*$ is the SDP variable. The advantages of such a formulation are twofold: the dimension of the optimization variable is reduced, and positive semidefiniteness is naturally enforced. However, optimization in $Y$ is non-convex. In prior work, it has been shown that, when the constraints on the factorized variable regularly define a smooth manifold, provided $k$ is large enough, for almost all cost matrices, all second-order stationary points (SOSPs) are optimal. Importantly, in practice, one can only compute points which approximately satisfy necessary optimality conditions, leading to the question: are such points also approximately optimal? To this end, and under similar assumptions, we use smoothed analysis to show that approximate SOSPs for a randomly perturbed objective function are approximate global optima, with $k$ scaling like the square root of the number of constraints (up to log factors).
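A minimal numerical sketch of the Burer-Monteiro factorization on a toy max-cut-style SDP ($\min \langle C, X\rangle$ s.t. $\mathrm{diag}(X)=1$, $X \succeq 0$) with a randomly perturbed cost matrix, the smoothed-analysis ingredient. The diagonal constraint becomes unit-norm rows of $Y$, a smooth manifold, so plain Riemannian gradient descent applies; here $k = 4 \approx \sqrt{2n}$ for the $n = 8$ constraints, matching the square-root scaling. This illustrates the formulation only, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 4

# Randomly perturbed symmetric cost matrix.
C = rng.standard_normal((n, n))
C = (C + C.T) / 2

# Burer-Monteiro: X = Y Y^T with diag(X) = 1, i.e. unit-norm rows of Y.
Y = rng.standard_normal((n, k))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

def cost(Y):
    return float(np.trace(C @ Y @ Y.T))

initial = cost(Y)
step = 0.05
for _ in range(500):
    G = 2 * C @ Y                                      # Euclidean gradient
    G -= np.sum(G * Y, axis=1, keepdims=True) * Y      # project onto tangent space
    Y -= step * G                                      # Riemannian gradient step
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)      # retract to the manifold
```

The per-row normalization is the retraction; the final $Y$ is an approximate SOSP of the factorized problem, which is the kind of point the abstract's question concerns.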
A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem
Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions about individual people (such as criminal recidivism prediction, lending, and sequential drug trials), exploration corresponds to explicitly sacrificing the well-being of one individual for the potential future benefit of others. In such settings, one might like to run a ``greedy'' algorithm, which always makes the optimal decision for the individuals at hand --- but doing this can result in a catastrophic failure to learn. In this paper, we consider the linear contextual bandit problem and revisit the performance of the greedy algorithm. We give a smoothed analysis, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the algorithm to achieve ``no regret'', perhaps (depending on the specifics of the setting) with a constant amount of initial training data. This suggests that in slightly perturbed environments, exploration and exploitation need not be in conflict in the linear setting.
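A small simulation sketching the phenomenon, with illustrative choices of my own (shared ridge estimate across arms, Gaussian perturbations of an all-ones base context): the greedy rule never adds an exploration bonus, yet the perturbations supply enough context diversity for the estimate, and hence the regret, to behave well.

```python
import numpy as np

rng = np.random.default_rng(3)
d, K, T = 5, 4, 500
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)

# Ridge-regression state for a single linear model shared across arms.
lam = 1.0
A = lam * np.eye(d)              # running X^T X + lam * I
b = np.zeros(d)                  # running X^T r

regret = 0.0
for t in range(T):
    # Smoothed contexts: an adversarial base point plus a small
    # Gaussian perturbation added by nature.
    contexts = np.ones((K, d)) + 0.5 * rng.standard_normal((K, d))
    theta_hat = np.linalg.solve(A, b)
    arm = int(np.argmax(contexts @ theta_hat))     # greedy: no exploration bonus
    reward = contexts[arm] @ theta_star + 0.1 * rng.standard_normal()
    regret += np.max(contexts @ theta_star) - contexts[arm] @ theta_star
    A += np.outer(contexts[arm], contexts[arm])
    b += reward * contexts[arm]
```

Without the perturbation, an adversary could keep feeding contexts in directions the estimate has never seen; with it, the chosen contexts span all directions and greedy self-corrects.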
Convergence of $\log(1/\epsilon)$ for Gradient-Based Algorithms in Zero-Sum Games without the Condition Number: A Smoothed Analysis
Gradient-based algorithms have shown great promise in solving large (two-player) zero-sum games. However, their success has been mostly confined to the low-precision regime, since the number of iterations grows polynomially in $1/\epsilon$, where $\epsilon > 0$ is the duality gap. While it has been well documented that linear convergence---an iteration complexity scaling as $\log(1/\epsilon)$---can be attained even with gradient-based algorithms, that comes at the cost of introducing a dependency on certain condition-number-like quantities which can be exponentially large in the description of the game. To address this shortcoming, we examine the iteration complexity of several gradient-based algorithms in the celebrated framework of smoothed analysis, and we show that they have polynomial smoothed complexity, in that their number of iterations grows as a polynomial in the dimensions of the game, $\log(1/\epsilon)$, and $1/\sigma$, where $\sigma$ measures the magnitude of the smoothing perturbation. Our result applies to optimistic gradient and extra-gradient descent/ascent, as well as a certain iterative variant of Nesterov's smoothing technique.
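A minimal sketch of one of the methods the abstract names, optimistic gradient descent/ascent, on an unconstrained bilinear game $\min_x \max_y x^\top A y$ whose unique equilibrium is $(0,0)$. The matrix and step size here are illustrative choices of mine (a fixed, well-conditioned $A$), not the smoothed instances the paper analyzes.

```python
import numpy as np

# Unconstrained bilinear zero-sum game min_x max_y x^T A y.
A = np.diag([0.5, 1.0, 2.0])     # fixed, well-conditioned payoff matrix
eta = 0.1

x = np.ones(3)
y = np.ones(3)
gx_prev, gy_prev = A @ y, A.T @ x
start = np.linalg.norm(x) + np.linalg.norm(y)

for _ in range(2000):
    gx, gy = A @ y, A.T @ x
    # Optimistic step: extrapolate with the previous gradient, which is
    # what distinguishes OGDA from plain (divergent) descent/ascent.
    x = x - eta * (2 * gx - gx_prev)
    y = y + eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy

gap = np.linalg.norm(x) + np.linalg.norm(y)   # distance to the equilibrium
```

Plain simultaneous gradient descent/ascent spirals outward on bilinear games; the single extrapolation term makes the iterates contract toward the equilibrium, and the contraction rate is exactly where condition-number-like quantities enter.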
Review for NeurIPS paper: The Smoothed Possibility of Social Choice
Additional Feedback: Major comments: 1) Smoothed analysis: Calling your analysis "smoothed analysis" is a bit confusing. Smoothed analysis would take the worst case over profiles and then an expectation over the added noise; but, as you say in your related work, you are taking the worst case over distributions drawn from a family. Such analyses have been done in the past: consider, for example, work that analyzed the probability of Condorcet's paradox under an arbitrary distribution from the Mallows family.
Review for NeurIPS paper: Smoothed Analysis of Online and Differentially Private Learning
Summary and Contributions: This paper studies the very interesting question of moving beyond worst-case adversarial bounds for private/online learning. This is of particular relevance since the equivalence between private and online learning was recently resolved by Bun et al. 20, and this is a logical next step in that line of research. Rather than considering worst-case adversaries, the authors consider adversaries constrained to play instances from alpha-smooth distributions (hence smoothed analysis). Although smooth adversaries against online or private learners have been studied in specific instances, this is the first work to consider the problem in full generality. Results: (1) they show that online learning against smooth adversaries can be characterized by the bracketing number of the hypothesis class, and (2) that private learning against smoothed adversaries can be characterized by the VC dimension of the hypothesis class.
Fixed Point Computation: Beating Brute Force with Smoothed Analysis
Attias, Idan, Dagan, Yuval, Daskalakis, Constantinos, Yao, Rui, Zampetakis, Manolis
We propose a new algorithm that finds an $\varepsilon$-approximate fixed point of a smooth function from the $n$-dimensional $\ell_2$ unit ball to itself. We use the general framework of finding approximate solutions to a variational inequality, a problem that subsumes fixed point computation and the computation of a Nash Equilibrium. The algorithm's runtime is bounded by $e^{O(n)}/\varepsilon$, under the smoothed-analysis framework. This is the first known algorithm at this level of generality whose runtime is faster than $(1/\varepsilon)^{O(n)}$, the time that suffices for an exhaustive search. We complement this result with a lower bound of $e^{\Omega(n)}$ on the query complexity for finding an $O(1)$-approximate fixed point on the unit ball, which holds even in the smoothed-analysis model, yet without the assumption that the function is smooth. Existing lower bounds are only known for the hypercube, and adapting them to the ball does not give non-trivial results even for finding $O(1/\sqrt{n})$-approximate fixed points.
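To make the notion of an $\varepsilon$-approximate fixed point on the ball concrete, here is a toy illustration with a map and method of my own choosing: a nonexpansive self-map of the unit ball, driven to a small residual by Krasnoselskii-Mann averaging. This is far simpler than the setting above and is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10

# A nonexpansive self-map of the l2 unit ball: an orthogonal rotation plus
# a shift, projected back onto the ball (each piece is 1-Lipschitz).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
c = 0.3 * rng.standard_normal(n)

def f(x):
    z = Q @ x + c
    return z / max(np.linalg.norm(z), 1.0)   # projection onto the unit ball

# Krasnoselskii-Mann averaging: the residual ||f(x) - x|| decays like
# O(1/sqrt(iterations)), yielding an eps-approximate fixed point.
x = np.zeros(n)
for _ in range(5000):
    x = 0.5 * x + 0.5 * f(x)

eps = np.linalg.norm(f(x) - x)
```

The point of the abstract is that, for smooth functions under smoothed analysis, one can do markedly better than the generic $(1/\varepsilon)^{O(n)}$ exhaustive search over an $\varepsilon$-net of the ball.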